Data Report — Heart Disease (UCI id 45)

4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach

Source: UCI dataset 45

SemMap JSON-LD: dataset.semmap.json · RDFa HTML

Overview

Metric       Value
Dataset      Heart Disease (UCI id 45)
Source       UCI dataset 45
Rows         297
Columns      14
Discrete     9
Continuous   5
SemMap       SemMap JSON-LD, SemMap HTML
Missingness  Not modeled

Variables and summary

variable  inferred type  distribution
age       continuous     54.5421 ± 9.0497 [29, 48, 56, 61, 77]
sex       discrete       male [1]: 201 (67.68%)
cp        discrete       Asymptomatic [4]: 142 (47.81%)
                         Non-cardiac chest pain [3]: 83 (27.95%)
                         Atypical angina [2]: 49 (16.50%)
                         Typical angina [1]: 23 (7.74%)
trestbps  continuous     131.6936 ± 17.7628 [94, 120, 130, 140, 200]
chol      continuous     247.3502 ± 51.9976 [126, 211, 243, 276, 564]
fbs       discrete       >120 mg/dL [1]: 43 (14.48%)
restecg   discrete       normal [0]: 147 (49.49%)
                         LVH (Estes) [2]: 146 (49.16%)
                         ST-T abnormality [1]: 4 (1.35%)
thalach   continuous     149.5993 ± 22.9416 [71, 133, 153, 166, 202]
exang     discrete       yes [1]: 97 (32.66%)
oldpeak   continuous     1.0556 ± 1.1661 [0, 0, 0.8, 1.6, 6.2]
slope     discrete       upsloping [1]: 139 (46.80%)
                         flat [2]: 137 (46.13%)
                         downsloping [3]: 21 (7.07%)
ca        discrete       0: 174 (58.59%)
                         1: 65 (21.89%)
                         2: 38 (12.79%)
                         3: 20 (6.73%)
thal      discrete       normal [3]: 164 (55.22%)
                         reversible defect [7]: 115 (38.72%)
                         fixed defect [6]: 18 (6.06%)
num       discrete       <50% narrowing [0]: 160 (53.87%)
                         ≥50% narrowing [1]: 54 (18.18%)
                         2: 35 (11.78%)
                         3: 35 (11.78%)
                         4: 13 (4.38%)
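
The summaries above can be reproduced along the following lines. This is a minimal sketch, not the report's own code: it assumes the continuous entries are mean ± sample standard deviation followed by [min, Q1, median, Q3, max], and that quartiles use inclusive linear interpolation. The report states neither, and the function names are hypothetical.

```python
import statistics
from collections import Counter


def summarize_continuous(values):
    """Mean, sample SD, and five-number summary (assumed layout of the table)."""
    # Quartile method is an assumption; other methods shift Q1/Q3 slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "five_number": [min(values), q1, median, q3, max(values)],
    }


def summarize_discrete(values):
    """Levels ordered by frequency, with count and percentage of all rows."""
    n = len(values)
    return [
        (level, count, round(100 * count / n, 2))
        for level, count in Counter(values).most_common()
    ]


# The sex column: 201 of 297 rows are coded 1 (male), so 96 are coded 0.
sex = [1] * 201 + [0] * 96
print(summarize_discrete(sex)[0])  # (1, 201, 67.68)
```

The percentage for `sex` matches the 67.68% reported in the table.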

Fidelity summary

model       backend    disc jsd mean  disc jsd median  cont ks mean  cont w1 mean  downstream sign match
metasyn     metasyn    0.1149         0.1208           0.1683        3.5644        0.6316
clg_mi2     pybnesian  0.1002         0.0941           0.1604        4.7232
semi_mi5    pybnesian  0.1002         0.0941           0.1604        4.7232
ctgan_fast  synthcity  0.4027         0.3651           0.8823        43.3414
tvae_quick  synthcity  0.1058         0.1173           0.2518        8.402
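
For readers unfamiliar with the three fidelity measures, the sketch below shows one common way to compute them between a real and a synthetic column: Jensen-Shannon divergence (JSD) for discrete variables, and the two-sample Kolmogorov-Smirnov statistic (KS) and 1-Wasserstein distance (W1) for continuous ones. The report does not say which JSD variant (divergence or its square root, log base) or which implementation it uses, so the base-2 divergence and all function names here are assumptions.

```python
import math
from bisect import bisect_right


def jsd(p_counts, q_counts, base=2.0):
    """Jensen-Shannon divergence between two dicts of category counts."""
    cats = set(p_counts) | set(q_counts)
    p_tot, q_tot = sum(p_counts.values()), sum(q_counts.values())
    total = 0.0
    for c in cats:
        p = p_counts.get(c, 0) / p_tot
        q = q_counts.get(c, 0) / q_tot
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            total += 0.5 * p * math.log(p / m, base)
        if q > 0:
            total += 0.5 * q * math.log(q / m, base)
    return total


def _ecdf(sorted_xs, x):
    """Empirical CDF of a sorted sample, evaluated at x."""
    return bisect_right(sorted_xs, x) / len(sorted_xs)


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(_ecdf(a, x) - _ecdf(b, x)) for x in a + b)


def wasserstein1(a, b):
    """Area between the two empirical CDFs (1-Wasserstein distance)."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    return sum(
        abs(_ecdf(a, lo) - _ecdf(b, lo)) * (hi - lo)
        for lo, hi in zip(grid, grid[1:])
    )


# Toy data only; these are not values from the report.
print(ks_statistic([1, 2], [3, 4]))      # 1.0: the samples do not overlap
print(wasserstein1([0, 0], [1, 1]))      # 1.0: every point shifts by one unit
```

KS and JSD (in this base-2 form) are bounded by 1, whereas W1 is expressed in the variable's own units. This is why the W1 values for `chol` are much larger than those for `oldpeak` at comparable KS.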

Privacy summary

model       backend    n real  n synth  exact overlap rate  near duplicate rate eps  nn distance mean  k min  k pct lt5  k map  rare qi reproduction rate  identifiability score  delta presence
metasyn     metasyn    297     303      0                   0.9966                   0.0587            1      1          2      0                                                 2
clg_mi2     pybnesian  297     303      0                   0.9865                   0.0658            1      1          8      0                                                 2.75
semi_mi5    pybnesian  297     303      0                   0.9865                   0.0658            1      1          8      0                                                 2.75
ctgan_fast  synthcity  297     256      0                   0.1719                   0.3699            1      1          5      0                                                 2.2
tvae_quick  synthcity  297     256      0                   0.6641                   0.1902            1      1          1      0                                                 17
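
The distance-based columns can be read with the following sketch in mind. It is one plausible construction, not the report's definition: the report does not state the distance metric, the feature scaling, or the value of eps, so Euclidean distance on min-max-scaled numeric rows and `eps=0.1` are assumptions, and the function names are hypothetical.

```python
import math


def _scale(rows, lo, hi):
    """Min-max scale each column using bounds taken from the real data."""
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def distance_privacy(real, synth, eps=0.1):
    """Exact-overlap rate, near-duplicate rate, and mean nearest-neighbour distance.

    Each synthetic row is compared against every real row (brute force).
    """
    n_cols = len(real[0])
    lo = [min(r[j] for r in real) for j in range(n_cols)]
    hi = [max(r[j] for r in real) for j in range(n_cols)]
    real_s, synth_s = _scale(real, lo, hi), _scale(synth, lo, hi)
    nn = [min(math.dist(s, r) for r in real_s) for s in synth_s]
    real_set = {tuple(r) for r in real}
    return {
        "exact_overlap_rate": sum(tuple(s) in real_set for s in synth) / len(synth),
        "near_duplicate_rate_eps": sum(d <= eps for d in nn) / len(nn),
        "nn_distance_mean": sum(nn) / len(nn),
    }


# Toy data only; these are not values from the report.
real = [(0, 0), (10, 10)]
synth = [(0, 0), (5, 5)]
print(distance_privacy(real, synth))
```

Under a definition like this, a model can show an exact overlap rate of 0 while still having a high near-duplicate rate, as several rows of the table do: no synthetic row copies a real one exactly, but most lie within eps of one.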

Models

Real data

Model: metasyn (metasyn)

Per-variable fidelity
variable  type        KS      W1        JSD
age       continuous  0.0977  1.2657
sex       discrete                      0.1062
cp        discrete                      0.1343
trestbps  continuous  0.1777  3.7632
chol      continuous  0.0891  6.297
fbs       discrete                      0.1098
restecg   discrete                      0.1403
thalach   continuous  0.1937  6.2553
exang     discrete                      0.1208
oldpeak   continuous  0.2833  0.2407

Downstream metrics
metric           value
sign_match_rate  0.6316
formula          num ~ Q('age') + Q('sex') + Q('cp') + Q('fbs') + Q('trestbps') + Q('chol') + Q('restecg') + Q('thalach') + Q('exang') + Q('oldpeak') + Q('slope') + Q('ca') + Q('thal') + Q('age'):Q('sex') + Q('sex'):Q('cp') + Q('cp'):Q('fbs') + Q('fbs'):Q('trestbps') + Q('trestbps'):Q('chol')
skipped_reason
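
A sign-match rate of this kind is typically the share of regression coefficients whose sign agrees between a model fitted on the real data and the same model fitted on the synthetic data. The sketch below illustrates that idea only; the report does not document how coefficients are estimated, which terms are counted, or how zeros are treated, and `sign_match_rate` here is a hypothetical helper.

```python
def sign_match_rate(coef_real, coef_synth):
    """Share of shared model terms whose coefficient signs agree.

    Both arguments map term name -> fitted coefficient.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    shared = [term for term in coef_real if term in coef_synth]
    if not shared:
        return float("nan")
    matches = sum(sign(coef_real[t]) == sign(coef_synth[t]) for t in shared)
    return matches / len(shared)


# Toy coefficients only; these are not values from the report.
real_fit = {"Intercept": -1.2, "age": 0.03, "sex": 0.8}
synth_fit = {"Intercept": -0.9, "age": -0.01, "sex": 0.5}
print(sign_match_rate(real_fit, synth_fit))  # 2 of 3 signs agree

# 12 matches out of 19 terms rounds to the reported value.
print(round(12 / 19, 4))  # 0.6316
```

The formula above has 13 main effects and 5 interactions; with an intercept and one coefficient per term that gives 19 coefficients, so the reported 0.6316 is consistent with 12 matching signs. This reading is an inference from the arithmetic, not something the report states.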

Privacy metrics
metric                     value
n_real                     297
n_synth                    303
exact_overlap_rate         0
near_duplicate_rate_eps    0.9966
nn_distance_mean           0.0587
k_min                      1
k_pct_lt5                  1
k_map                      2
rare_qi_reproduction_rate  0
delta_presence             2
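
The `k_min` and `k_pct_lt5` rows read like k-anonymity statistics over a set of quasi-identifier (QI) columns. The sketch below shows the usual construction under stated assumptions: the report does not list its QI columns, does not say which table the classes are counted on, and does not say whether `k_pct_lt5` is a fraction or a percentage. The function name and the toy QI choice are hypothetical.

```python
from collections import Counter


def k_anonymity_summary(rows, qi_columns, threshold=5):
    """Smallest equivalence-class size and share of rows in classes below threshold.

    rows is a list of dicts; qi_columns names the quasi-identifier keys.
    """
    keys = [tuple(row[c] for c in qi_columns) for row in rows]
    class_size = Counter(keys)             # size of each QI equivalence class
    sizes = [class_size[k] for k in keys]  # class size seen by each row
    return {
        "k_min": min(sizes),
        "k_pct_lt": sum(s < threshold for s in sizes) / len(sizes),
    }


# Toy rows only; these are not records from the dataset.
rows = [
    {"age": 54, "sex": 1},
    {"age": 54, "sex": 1},
    {"age": 61, "sex": 0},
]
print(k_anonymity_summary(rows, ["age", "sex"]))
```

With only 297 rows and continuous quasi-identifiers, small classes are expected. A `k_min` of 1 with every row below the threshold, as reported, is what a construction like this would yield when most QI combinations are unique or nearly so.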

variable  distribution
age       core.normal
sex       core.multinoulli
cp        core.multinoulli
trestbps  core.lognormal
chol      core.lognormal
fbs       core.multinoulli
restecg   core.multinoulli
thalach   core.normal
exang     core.multinoulli
oldpeak   core.truncated_normal
slope     core.multinoulli
ca        core.multinoulli
thal      core.multinoulli
num       core.multinoulli

Model: clg_mi2 (pybnesian)

Per-variable fidelity
variable  type        KS      W1        JSD
age       continuous  0.0838  1.1178
sex       discrete                      0.0941
cp        discrete                      0.1516
trestbps  continuous  0.1645  3.4709
chol      continuous  0.1784  11.0214
fbs       discrete                      0.1171
restecg   discrete                      0.126
thalach   continuous  0.2135  7.7047
exang     discrete                      0.1121
oldpeak   continuous  0.1617  0.301

Privacy metrics
metric                     value
n_real                     297
n_synth                    303
exact_overlap_rate         0
near_duplicate_rate_eps    0.9865
nn_distance_mean           0.0658
k_min                      1
k_pct_lt5                  1
k_map                      8
rare_qi_reproduction_rate  0
delta_presence             2.75

Model: semi_mi5 (pybnesian)

Per-variable fidelity
variable  type        KS      W1        JSD
age       continuous  0.0838  1.1178
sex       discrete                      0.0941
cp        discrete                      0.1516
trestbps  continuous  0.1645  3.4709
chol      continuous  0.1784  11.0214
fbs       discrete                      0.1171
restecg   discrete                      0.126
thalach   continuous  0.2135  7.7047
exang     discrete                      0.1121
oldpeak   continuous  0.1617  0.301

Privacy metrics
metric                     value
n_real                     297
n_synth                    303
exact_overlap_rate         0
near_duplicate_rate_eps    0.9865
nn_distance_mean           0.0658
k_min                      1
k_pct_lt5                  1
k_map                      8
rare_qi_reproduction_rate  0
delta_presence             2.75

Model: ctgan_fast (synthcity)

Per-variable fidelity
variable  type        KS      W1        JSD
age       continuous  0.9667  20.1587
sex       discrete                      0.4136
cp        discrete                      0.5973
trestbps  continuous  0.9833  35.4833
chol      continuous  1       116.9667
fbs       discrete                      0.048
restecg   discrete                      0.7657
thalach   continuous  0.8495  41.4656
exang     discrete                      0.3378
oldpeak   continuous  0.612   2.6329

Privacy metrics
metric                     value
n_real                     297
n_synth                    256
exact_overlap_rate         0
near_duplicate_rate_eps    0.1719
nn_distance_mean           0.3699
k_min                      1
k_pct_lt5                  1
k_map                      5
rare_qi_reproduction_rate  0
delta_presence             2.2

Model: tvae_quick (synthcity)

Per-variable fidelity
variable  type        KS      W1        JSD
age       continuous  0.1956  2.8816
sex       discrete                      0.1173
cp        discrete                      0.1046
trestbps  continuous  0.2924  6.3688
chol      continuous  0.3034  24.9184
fbs       discrete                      0.002
restecg   discrete                      0.1584
thalach   continuous  0.2227  7.4954
exang     discrete                      0.0933
oldpeak   continuous  0.2448  0.3459

Privacy metrics
metric                     value
n_real                     297
n_synth                    256
exact_overlap_rate         0
near_duplicate_rate_eps    0.6641
nn_distance_mean           0.1902
k_min                      1
k_pct_lt5                  1
k_map                      1
rare_qi_reproduction_rate  0
delta_presence             17